AITopics | medical model

AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation

Zhang, Xiechi, Ouyang, Zetian, Wang, Linlin, de Melo, Gerard, Cao, Zhu, Wang, Xiaoling, Zhang, Ya, Wang, Yanfeng, He, Liang

arXiv.org Artificial Intelligence, May-20-2025

With the proliferation of large language models (LLMs) in the medical domain, there is increasing demand for improved evaluation techniques to assess their capabilities. However, traditional metrics like F1 and ROUGE, which rely on token overlap to measure quality, largely overlook the importance of medical terminology. While human evaluation tends to be more reliable, it can be very costly and may also suffer from inaccuracies due to limits in human expertise and motivation. Although there are some evaluation methods based on LLMs, their usability in the medical field is limited by their proprietary nature or lack of medical expertise. To tackle these challenges, we present AutoMedEval, an open-source automatic evaluation model with 13B parameters specifically engineered to measure the question-answering proficiency of medical LLMs. The overarching objective of AutoMedEval is to assess the quality of responses produced by diverse models, with the aim of significantly reducing the dependence on human evaluation. Specifically, we propose a hierarchical training method involving curriculum instruction tuning and an iterative knowledge introspection mechanism, enabling AutoMedEval to acquire professional medical assessment capabilities with limited instructional data. Human evaluations indicate that AutoMedEval surpasses other baselines in terms of correlation with human judgments.
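
The limitation of token-overlap metrics noted in the abstract can be illustrated with a small sketch. This is not AutoMedEval or its evaluation code; `token_f1` is a hypothetical helper implementing a common whitespace-tokenized F1, and the example sentences are invented for illustration.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (hypothetical helper, whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented examples: one answer swaps in the opposite clinical term,
# the other is a correct paraphrase that shares no tokens.
reference = "the patient has hypertension"
wrong_term = "the patient has hypotension"
paraphrase = "high blood pressure was diagnosed"

print(token_f1(wrong_term, reference))   # 0.75
print(token_f1(paraphrase, reference))   # 0.0
```

Under these assumptions the clinically wrong answer scores higher than the correct paraphrase, which is the kind of failure that motivates a domain-aware evaluator.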

large language model, machine learning, natural language, (15 more...)

2505.11887

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.94)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment

Zhang, Yiming, Chang, Zheng, Cai, Wentao, Ren, MengXing, Yuan, Kang, Sun, Yining, Ding, Zenghui

arXiv.org Artificial Intelligence, Jan-6-2025

Recent research on large language models (LLMs), which are pre-trained on massive general-purpose corpora, has achieved breakthroughs in responding to human queries. However, these methods face challenges, including insufficient data to support extensive pre-training and an inability to align responses with users' instructions. To address these issues, we introduce a medical instruction dataset, CMedINS, containing six medical instructions derived from actual medical tasks, which effectively fine-tunes LLMs in conjunction with other data. Subsequently, we launch our medical model, IIMedGPT, employing an efficient preference alignment method, Direct Preference Optimization (DPO). The results show that our final model outperforms existing medical models in medical dialogue. Datasets, code, and model checkpoints will be released upon acceptance.
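
For readers unfamiliar with the preference alignment method named in the abstract, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not the IIMedGPT training code: the function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage lines are all assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under a frozen reference model. beta is an assumed
    hyperparameter value, not one taken from the paper.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # relative to the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative values only: the loss shrinks when the policy prefers the
# chosen response more strongly than the reference model does.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
print(dpo_loss(-15.0, -5.0, -10.0, -10.0))
```

The first call favors the human-preferred response and yields the smaller loss; the second favors the rejected response and is penalized.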

large language model, machine learning, natural language, (18 more...)

2501.02869

Country:

North America (0.28)
Asia > China (0.15)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Introducing the Large Medical Model: State of the art healthcare cost and risk prediction with transformers trained on patient event sequences

Sahu, Ricky, Marriott, Eric, Siegel, Ethan, Wagner, David, Uzan, Flore, Yang, Troy, Javed, Asim

arXiv.org Machine Learning, Dec-5-2024

With U.S. healthcare spending approaching $5T (NHE Fact Sheet 2024), and 25% of it estimated to be wasteful (Waste in the US Health Care System: Estimated Costs and Potential for Savings, n.d.), the need to better predict risk and optimal patient care is ever more important. This paper introduces the Large Medical Model (LMM), a generative pre-trained transformer (GPT) designed to guide and predict the broad facets of patient care and healthcare administration. The model is trained on medical event sequences from over 140M longitudinal patient claims records with a specialized vocabulary built from medical terminology systems and demonstrates a superior capability to forecast healthcare costs and identify potential risk factors. Through experimentation and validation, we showcase the LMM's proficiency not only in cost and risk prediction, but also in discerning intricate patterns within complex medical conditions and identifying novel relationships in patient care. The LMM improves cost prediction by 14.1% over the best commercial models and chronic condition prediction by 1.9% over the best transformer models in research predicting a broad set of conditions. The LMM is a substantial advancement in healthcare analytics, offering the potential to significantly enhance risk assessment, cost management, and personalized medicine.

healthcare cost, prediction, sequence, (15 more...)

2409.13

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services > Reimbursement (1.00)
Health & Medicine > Government Relations & Public Policy (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Data Science > Data Mining (0.88)

Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

Jeong, Daniel P., Garg, Saurabh, Lipton, Zachary C., Oberst, Michael

arXiv.org Artificial Intelligence, Nov-19-2024

Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper, we compare seven public "medical" LLMs and two VLMs against their corresponding base models, arriving at a different conclusion: all medical VLMs and nearly all medical LLMs fail to consistently improve over their base models in the zero-/few-shot prompting regime for medical question-answering (QA) tasks. For instance, across the tasks and model pairs we consider in the 3-shot setting, medical LLMs only outperform their base models in 12.1% of cases, reach a (statistical) tie in 49.8% of cases, and are significantly worse than their base models in the remaining 38.2% of cases. Our conclusions are based on (i) comparing each medical model head-to-head, directly against the corresponding base model; (ii) optimizing the prompts for each model separately; and (iii) accounting for statistical uncertainty in comparisons. While these basic practices are not consistently adopted in the literature, our ablations show that they substantially impact conclusions. Our findings suggest that state-of-the-art general-domain models may already exhibit strong medical knowledge and reasoning capabilities, and offer recommendations to strengthen the conclusions of future studies.
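
The win/tie/loss framing in the abstract depends on accounting for statistical uncertainty in a head-to-head comparison. The sketch below shows one common way to do that, a paired percentile bootstrap over per-question correctness. The abstract does not specify the authors' exact procedure, so the method, the function name `compare_paired`, and the sample data are assumptions for illustration only.

```python
import random


def compare_paired(medical_correct, base_correct,
                   n_boot=2000, alpha=0.05, seed=0):
    """Label a medical-vs-base comparison as 'win', 'tie', or 'loss'.

    Inputs are 0/1 correctness lists over the same questions. A paired
    percentile bootstrap estimates a confidence interval for the accuracy
    difference (medical minus base); an interval containing 0 is a tie.
    """
    assert len(medical_correct) == len(base_correct)
    rng = random.Random(seed)
    n = len(medical_correct)
    # Per-question differences keep the pairing between the two models.
    diffs = [m - b for m, b in zip(medical_correct, base_correct)]
    boot_means = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    boot_means.sort()
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    if lower > 0:
        return "win"
    if upper < 0:
        return "loss"
    return "tie"


# Invented per-question results for one hypothetical model pair and task.
medical = [1, 0, 1, 1, 0, 1, 0, 1]
base = [1, 1, 0, 1, 0, 1, 1, 0]
print(compare_paired(medical, base))
```

Repeating such a comparison over every model pair and task, then counting the labels, would give percentages of the kind quoted in the abstract, though the actual figures depend on the authors' own protocol.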

exact-match accuracy, mmlu, prompt format, (14 more...)

2411.04118

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (0.92)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models

Jeong, Daniel P., Mani, Pranav, Garg, Saurabh, Lipton, Zachary C., Oberst, Michael

arXiv.org Artificial Intelligence, Nov-13-2024

Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper, we compare ten public "medical" LLMs and two VLMs against their corresponding base models, arriving at a different conclusion: all medical VLMs and nearly all medical LLMs fail to consistently improve over their base models in the zero-/few-shot prompting and supervised fine-tuning regimes for medical question-answering (QA). For instance, across all tasks and model pairs we consider in the 3-shot setting, medical LLMs only outperform their base models in 22.7% of cases, reach a (statistical) tie in 36.8% of cases, and are significantly worse than their base models in the remaining 40.5% of cases. Our conclusions are based on (i) comparing each medical model head-to-head, directly against the corresponding base model; (ii) optimizing the prompts for each model separately in zero-/few-shot prompting; and (iii) accounting for statistical uncertainty in comparisons. While these basic practices are not consistently adopted in the literature, our ablations show that they substantially impact conclusions. Meanwhile, we find that after fine-tuning on specific QA tasks, medical LLMs can show performance improvements, but the benefits do not carry over to tasks based on clinical notes. Our findings suggest that state-of-the-art general-domain models may already exhibit strong medical knowledge and reasoning capabilities, and offer recommendations to strengthen the conclusions of future studies.

dataset, prompt format, qa dataset, (13 more...)

2411.0887

Country:

Asia > Middle East > UAE (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Health & Medicine > Health Care Providers & Services (0.67)
Health & Medicine > Health Care Technology > Medical Record (0.50)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

A Large Medical Model based on Visual Physiological Monitoring for Public Health

Huang, Bin, Zhao, Changchen, Liu, Zimeng, Hong, Shenda, Zhang, Baochang, Wang, Wenjin, Liu, Hui

arXiv.org Artificial Intelligence, Apr-21-2024

The widespread outbreak of the COVID-19 pandemic has sounded a warning about the globalization challenges in public health. In this context, the establishment of large-scale public health datasets, of medical models, and of decision-making systems with a human-centric approach holds strategic significance. Recently, groundbreaking advancements have emerged in AI methods for physiological signal monitoring and disease diagnosis based on camera sensors. These approaches, requiring no specialized medical equipment, offer convenient ways of collecting large-scale medical data in response to public health events. Not only do these methods facilitate the acquisition of unbiased datasets, but they also enable the development of fair large medical models (LMMs). Therefore, we outline a prospective framework and heuristic vision for a public health large medical model (PHLMM) utilizing visual-based physiological monitoring (VBPM) technology. The PHLMM can be considered a "convenient and universal" framework for public health, advancing the United Nations' "Sustainable Development Goals 2030", particularly in its promotion of Universal Health Coverage (UHC) in low- and middle-income countries. Furthermore, this paper provides an outlook on the crucial application prospects of PHLMM in response to public health challenges and its significant role in the field of AI for medicine (AI4medicine). In summary, PHLMM serves as a solution for constructing a large-scale medical database and LMM, eliminating the issue of dataset bias and unfairness in AI models. The outcomes will contribute to the establishment of an LMM framework for public health, acting as a crucial bridge for advancing AI4medicine.

information, phlmm, visual physiological monitoring, (12 more...)

2406.07558

Country:

North America > United States (0.46)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Europe > San Marino > Fiorentino > Fiorentino (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM

Liu, Yihao, Zhang, Jiaming, She, Zhangcong, Kheradmand, Amir, Armand, Mehran

arXiv.org Artificial Intelligence, Jun-20-2023

The advent of large language models (LLM) has led to significant progress in image analysis with potential for future advancements. SAM [Kirillov et al., 2023] is a revolutionary foundation model for image segmentation and has already shown the capability of handling diverse segmentation tasks. SAM especially prevails in zero-shot domain generalization cases compared with the existing elaborate, fine-tuned models trained on specific domains. An important prospect for the application of SAM would be its adaptation to the complex task of segmenting medical images with significant inter-subject variations and a low signal-to-noise ratio. The segmentation task allows separation of different structures in medical images, which are then used to detect the region of interest or reconstruct multi-dimensional anatomical models [Sinha and Dolz, 2021]. The existing AI-based segmentation methods, however, do not fully bridge the domain gap among different imaging modalities, such as computed tomography (CT), magnetic resonance imaging (MRI), or ultrasound (US) [Wang et al., 2020]. The domain gap refers to the difference in the data format across various image modalities, as each modality offers a distinct advantage in visualizing anatomical structures and related pathologies (e.g., tumor, bone fracture). This difference introduces specific challenges for training AI systems to perform common analysis without the need for a comprehensive dataset that includes all relevant domains from various image modalities.

large language model, medical model, natural language, (17 more...)

2304.05622

Country: North America > United States > Maryland > Baltimore (0.06)

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Definition drives design: Disability models and mechanisms of bias in AI technologies

Newman-Griffis, Denis, Rauchberg, Jessica Sage, Alharbi, Rahaf, Hickman, Louise, Hochheiser, Harry

arXiv.org Artificial Intelligence, Nov-23-2022

The increasing deployment of artificial intelligence (AI) tools to inform decision making across diverse areas including healthcare, employment, social benefits, and government policy, presents a serious risk for disabled people, who have been shown to face bias in AI implementations. While there has been significant work on analysing and mitigating algorithmic bias, the broader mechanisms of how bias emerges in AI applications are not well understood, hampering efforts to address bias where it begins. In this article, we illustrate how bias in AI-assisted decision making can arise from a range of specific design decisions, each of which may seem self-contained and non-biasing when considered separately. These design decisions include basic problem formulation, the data chosen for analysis, the use the AI technology is put to, and operational design elements in addition to the core algorithmic design. We draw on three historical models of disability common to different decision-making settings to demonstrate how differences in the definition of disability can lead to highly distinct decisions on each of these aspects of design, leading in turn to AI technologies with a variety of biases and downstream effects. We further show that the potential harms arising from inappropriate definitions of disability in fundamental design stages are further amplified by a lack of transparency and disabled participation throughout the AI design process. Our analysis provides a framework for critically examining AI technologies in decision-making contexts and guiding the development of a design praxis for disability-related AI analytics. We put forth this article to provide key questions to facilitate disability-led design and participatory development to produce more fair and equitable AI technologies in disability-related contexts.

artificial intelligence, disability, natural language, (15 more...)

2206.08287

Country:

North America > Canada > Ontario > Hamilton (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
(16 more...)

Genre: Research Report (0.64)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)
